{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "http://blog.csdn.net/itplus/article/details/37969817"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "统计语言模型<br>\n",
    "$W=w_1^T:=(w_1,...,w^T)$是由$T$个词按顺序组成的句子<br>\n",
    "$$p(W)=p(w_1^T)=p(w_1,...,w_T)$$\n",
    "按照Bayes公式，上述公式可链式分解为\n",
    "$$p(w_1^T)=p(w_1)p(w_2|w_1)...p(w_T|w_1^T)$$\n",
    "再有n-gram模型的假设，n=2时\n",
    "$$p(w_k|w_1^{k-1})\\approx p(w_k|w_{k-2+1}^{k-1})\\approx \\frac{count(w_{k-1},w_k)}{count(w_{k-1})} $$\n",
    "再对n-gram模型进行抽象：\n",
    "$$Context(w_i)=w_{i-n+1}^{i-1}$$\n",
    "那么上述句子的概率变成一定条件下一个词的概率：\n",
    "$$p(w|Context(w))$$\n",
    "按照优化问题的套路，可以再对乘积项抽象：\n",
    "$$p(w|Context(w))=F(w,Context,\\theta)$$\n",
    "那么下一步就是考虑合理构造函数$F$，这是word2vec的基础。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "HS模型，CBOW结构\n",
    "节点图： \n",
    "![节点图](http://bucket-lz.oss-cn-beijing.aliyuncs.com/%E8%8A%82%E7%82%B9%E5%9B%BE.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "word.code的初始化，生成哈夫曼树<br>\n",
    "model.syn1的初始化，非叶节点（参数）向量初始化<br>\n",
    "model.wv.syn0的初始化，词向量初始化<br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对应关系<br>\n",
    "l1--:$x_w^T$<br>\n",
    "l2a--:$\\theta_{1..(l^{w}-1)}^w$<br>\n",
    "fa--:$\\frac{1}{1-e^{\\theta_{1..(l^{w}-1)}^wx_w}}$<br>\n",
    "ga--:(1-word.code-fa)$\\eta$<br>\n",
    "neu1a--:$e$<br>\n",
    "model.syn1[word.point]--:$\\theta_{1..l^w-1}^w$<br>\n",
    "model.wv.syn0[input_word_indices]--:$v(context(w))$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "l2a = model.syn1[word.point] \n",
    "def update_weights(self):\n",
    "    self.syn1 = vstack([self.syn1, zeros((gained_vocab, self.layer1_size), dtype=REAL)])\n",
    "def reset_weights(self):\n",
    "    self.syn1 = zeros((len(self.wv.vocab), self.layer1_size), dtype=REAL)\n",
    "\n",
    "self.wv = KeyedVectors()\n",
    "class KeyedVectors(utils.SaveLoad):\n",
    "#Class to contain vectors and vocab for the Word2Vec training class and other w2v methods not directly involved in training such as most_similar()\n",
    "    def __init__(self):\n",
    "        self.syn0 = []\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "目标函数的提出\n",
    "$$L=\\sum_{w\\in C}p(w|Context(w))$$\n",
    "有了哈夫曼树的结构，就能把CBOW模式写成\n",
    "$$p(w|Context(w))=\\prod_{j=2}^{l^w}p(d_j|x_w,\\theta_{j-1}^w)$$\n",
    "其中\n",
    "\\begin{equation}\n",
    "p(d_j|x_w,\\theta_{j-1}^w)=\\left\\{\n",
    "\\begin{aligned}\n",
    "&\\sigma(x_w^T\\theta_{j-1}^w),&d_j^w=0\\\\\n",
    "&1-\\sigma(x_w^T\\theta_{j-1}^w),&d_j^w=1\n",
    "\\end{aligned}\n",
    "\\right.\n",
    "\\end{equation}\n",
    "或者写在一起：\n",
    "$$p(d_j|x_w,\\theta_{j-1}^w)=[\\sigma(x_w^T\\theta_{j-1}^w)]^{1-d_j^w}\\cdot [1-\\sigma(x_w^T\\theta_{j-1}^w)]^{d_j^w}$$\n",
    "再利用对数似然，写出目标函数\n",
    "$$L=\\sum_{w\\in C}\\sum_{j=2}^{l^w}L(w,j)$$\n",
    "$$L(w,j)=(1-d_j^w)log[\\sigma(x_w^T\\theta_{j-1}^w)]+d_j^w log[1-\\sigma(x_w^T\\theta_{j-1}^w)]$$\n",
    "所以\n",
    "\\begin{equation}\n",
    " L(w,j)=\\left\\{\n",
    "\\begin{aligned}\n",
    " & log[\\sigma(x_w^T\\sigma_j^w)] ,&d_j=0\\\\\n",
    " & log[1-\\sigma(x_w^T\\sigma_j^w)]\\approx log[-\\sigma(x_w^T\\sigma_j^w) ,&d_j=1\n",
    "\\end{aligned}\n",
    "\\right.\n",
    "\\end{equation}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def score_cbow_pair(model, word, l1):\n",
    "    l2a = model.syn1[word.point]  # 2d matrix, codelen x layer1_size\n",
    "    sgn = (-1.0)**word.code  # ch function, 0-> 1, 1 -> -1\n",
    "    lprob = -logaddexp(0, -sgn * dot(l1, l2a.T))\n",
    "    return sum(lprob)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "随机梯度下降<br>\n",
    "各个参数的更新公式如下：\n",
    "$$\\frac{\\partial L(w,j)}{\\partial \\theta_{j-1}^{w}}=[1-d_j^w-\\sigma(x^T_w\\theta_{j-1}^w)]x_w$$\n",
    "从而$\\theta_{j-1}^w$的更新公式为\n",
    "$$\\theta_{j-1}^w:=\\theta_{j-1}^w+\\eta[1-d_j^w-\\sigma(x_w^T\\theta_{j-1}^w)]x_w$$\n",
    "对词向量求偏导\n",
    "$$\\frac{\\partial L(w,j)}{\\partial x_{w}}=[1-d_j^w-\\sigma(x^T_w\\theta_{j-1}^w)]\\theta_{j-1}^w$$\n",
    "注意到$x_w$是$Context(w)$中各个词的累加，如何利用$\\frac{\\partial L(w,j)}{\\partial x_{w}}$更新$v(\\hat{w})$，这里只进行简单处理，直接取：\n",
    "$$v(\\hat{w}):=v(\\hat{w})+\\eta\\sum_{j=2}^{l^w}\\frac{\\partial L(w,j)}{\\partial x_w},\\hat{w}\\in Context(w)$$\n",
    "当然也可以取平均值处理，在gensim中有专门是否取平均的选项model.cbow_mean默认为False。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从二分类的角度对每一个非叶子节点的左右孩子指定类别\n",
    "$$Label(p_i^w)=1-d_i^w,i=2,3,...,l^w$$\n",
    "1 &nbsp; $(-1)^1$ &nbsp; 左边 &nbsp; 负类 &nbsp;  $1-\\sigma(x_w^T\\theta)$ &nbsp; 权值大<BR>\n",
    "0 &nbsp; $(-1)^0$ &nbsp; 右边 &nbsp; 正类 &nbsp;  $\\sigma(x_w^T\\theta)$   &nbsp; 权值小"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "CBOW算法核心步骤:<br>\n",
    "1,$e=0$ <br>\n",
    "2,$x_w=\\sum_{u\\in Context(w)}v(u)$<br>\n",
    "3,For j=2:$l^w$<br>\n",
    "  3.1 $f_j=\\sigma(x_w^T\\theta_{j-1}^w)$<br>\n",
    "  3.2 $g_j=\\eta(1-d_j^w-f_j)$<br>\n",
    "  3.3 $e=e+g_j\\theta^w_{j-1}$<br>\n",
    "  3.4 $\\theta_{j-1}^w:=\\theta_{j-1}^w+g_jx_w$<br>\n",
    "4,For $u\\in Context(w)$:<br>\n",
    "&nbsp;&nbsp;&nbsp;&nbsp;&emsp;&emsp;$v(u):=v(u)+e$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#考察词汇的前置节点（那几个非叶子节点）对应的向量\n",
    "l2a = model.syn1[word.point]  # 2d matrix, codelen x layer1_size\n",
    "prod_term = dot(l1, l2a.T)\n",
    "fa = expit(prod_term)  # propagate hidden -> output\n",
    "#word.code  是在哈夫曼树中的前置节点d_j（比如有10个）在当前哈夫曼节点的代号  0 或者  1 \n",
    "ga = (1. - word.code - fa) * alpha  # vector of error gradients multiplied by the learning rate\n",
    "#说的是3.3和3.4不能交换，但是这里并不是交换，而且是可以写到下面的\n",
    "if learn_hidden:\n",
    "    model.syn1[word.point] += outer(ga, l1)  # learn hidden -> output\n",
    "neu1e += dot(ga, l2a)  # save error\n",
    "#更新词向量\n",
    "if learn_vectors:\n",
    "    # learn input -> hidden, here for all words in the window separately\n",
    "    if not model.cbow_mean and input_word_indices:\n",
    "        neu1e /= len(input_word_indices)\n",
    "    for i in input_word_indices:\n",
    "        model.wv.syn0[i] += neu1e * model.syn0_lockf[i]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "if compute_loss:\n",
    "    sgn = (-1.0)**predict_word.code  # `ch` function, 0 -> 1, 1 -> -1\n",
    "    lprob = -log(expit(-sgn * prod_term))\n",
    "    model.running_training_loss += sum(lprob)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<ol>\n",
    "<li>Red</li>\n",
    "<li>Green</li>\n",
    "<li>Blue</li>\n",
    "</ol>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
